fix(tests): eliminate test-api shard cross-test contamination (vchord cache, tenant schemas, maintenance routine TOCTOU) by nicoloboschi · Pull Request #2310 · vectorize-io/hindsight

nicoloboschi · 2026-06-19T10:42:00Z

Summary

Fixes the recurring test-api shard failures. The shard had been red across re-runs, blocking the CI signal for every code PR. Three independent cross-test contamination / isolation bugs were stacked in the test-api (2/3) shard, each surfacing as a deterministic-looking "infra flake". This PR fixes all three.

1. `vchord` config-cache leak (the dominant failure — dozens of tests)

test_link_utils.py's ANN tests monkeypatch HINDSIGHT_API_VECTOR_EXTENSION (one sets vchord). That value is read through the process-global config cache (get_config()), and monkeypatch reverts only the env var on teardown — not the cache. Once get_config() caches vchord, every later bank-creating test on that xdist worker builds per-bank indexes with USING vchordrq against the pgvector-only test DB and fails with:

asyncpg.exceptions.UndefinedObjectError: access method "vchordrq" does not exist

Fix: an autouse fixture on TestComputeSemanticLinksAnnPgBouncerSafety that clears the config cache before/after each test (same pattern as test_worker_retry_knobs.py).
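The leak and the fixture pattern can be sketched in isolation. This is a minimal illustration, not the repository's code: `get_config()` here is a hypothetical `lru_cache`-backed stand-in for the real process-global cache, the `"pgvector"` default is an assumption, and `reset_config_cache()` is a plain generator in the shape of a pytest autouse fixture (in the real suite it would carry `@pytest.fixture(autouse=True)`).

```python
import os
from functools import lru_cache

ENV_VAR = "HINDSIGHT_API_VECTOR_EXTENSION"  # env var named in the PR


@lru_cache(maxsize=1)
def get_config():
    """Hypothetical stand-in for the process-global cached config."""
    # The "pgvector" default is an assumption made for this sketch.
    return {"vector_extension": os.environ.get(ENV_VAR, "pgvector")}


def clear_config_cache():
    """Hypothetical helper: drop the cached config."""
    get_config.cache_clear()


def reset_config_cache():
    """Fixture-shaped generator: clear the cache before and after the test body."""
    clear_config_cache()
    yield
    clear_config_cache()


def run_ann_test(use_fixture):
    """Mimic one ANN test: set the env var, read config, restore the env."""
    fixture = reset_config_cache() if use_fixture else None
    if fixture is not None:
        next(fixture)  # fixture setup
    previous = os.environ.get(ENV_VAR)
    os.environ[ENV_VAR] = "vchord"
    try:
        assert get_config()["vector_extension"] == "vchord"
    finally:
        # This is all monkeypatch undoes: the env var, not the cache.
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous
        if fixture is not None:
            next(fixture, None)  # fixture teardown
```

Without the fixture, a later `get_config()` call on the same process still returns the value cached during the test. With it, the next call rebuilds the config from the restored environment.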

2. Half-built tenant schemas in `test_maintenance_multitenant`

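The test provisioned 100 tenant schemas with autocommitted CREATE SCHEMA + per-table CREATE TABLE statements, leaving a window in which a schema existed with only some of its tables. The global maintenance routines, exercised concurrently on another xdist worker, could then query a table that had not been created yet.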

Fix: wrap provisioning in a single transaction so schemas appear atomically.
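The visibility difference between autocommitted and transactional DDL can be demonstrated with a small, self-contained sketch. It uses SQLite from the Python standard library as a stand-in for Postgres, and table-name prefixes as a stand-in for tenant schemas; the table list and the `mt_001` tenant name are illustrative assumptions, not the test's actual definitions.

```python
import os
import sqlite3
import tempfile

TABLES = ("banks", "memory_units", "audit_log")  # illustrative subset


def visible_tables(path):
    """Return the table names a fresh, independent connection can see."""
    reader = sqlite3.connect(path)
    try:
        rows = reader.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        reader.close()


def provision(path, tenant, *, atomic):
    """Create the tenant's tables and record how many a second connection
    sees after each CREATE TABLE."""
    seen = []
    # isolation_level=None: the sqlite3 module issues no implicit BEGIN, so
    # each statement autocommits unless a transaction is opened explicitly.
    writer = sqlite3.connect(path, isolation_level=None)
    try:
        if atomic:
            writer.execute("BEGIN")
        for table in TABLES:
            writer.execute(
                f'CREATE TABLE "{tenant}_{table}" (id INTEGER PRIMARY KEY)'
            )
            seen.append(len(visible_tables(path)))
        if atomic:
            writer.execute("COMMIT")
    finally:
        writer.close()
    return seen


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        autocommit = provision(os.path.join(tmp, "a.db"), "mt_001", atomic=False)
        atomic = provision(os.path.join(tmp, "b.db"), "mt_001", atomic=True)
        final = len(visible_tables(os.path.join(tmp, "b.db")))
    return autocommit, atomic, final
```

In the autocommit run the observer sees the table count grow one at a time, which is the half-built window described above. In the transactional run it sees nothing until the commit, then the full set.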

3. TOCTOU in the maintenance routines (latent prod bug)

public.banks_needing_consolidation() / public.schemas_with_expired_rows() snapshot the schemas owning a target table from pg_class, then run a dynamic query against each. A schema dropped between snapshot and query (a tenant being deleted/migrated, or the multi-tenant test's teardown running concurrently with test_maintenance_routines) aborts the whole routine:

relation "<schema>.memory_units" does not exist
relation "<schema>.audit_log" does not exist

Fix: forward migration c7e9f1a3b5d2 redefines both routines (CREATE OR REPLACE, public/base-run gated, PG-only) so each per-schema query runs in its own subtransaction that skips the schema on undefined_table / invalid_schema_name / undefined_column. Adds a deterministic regression test (schema with memory_units but no banks table).
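The snapshot-then-query race and the skip-on-missing behaviour can be sketched outside Postgres. The example below is an analogy under stated assumptions: SQLite stands in for the database, tables named `<tenant>_memory_units` stand in for per-tenant schemas, and a Python `try/except` plays the role of the per-schema subtransaction with an exception handler. It is not the migration's PL/pgSQL.

```python
import sqlite3

SUFFIX = "_memory_units"


def snapshot_targets(conn):
    """Step 1: snapshot which tenants own the target table (catalog lookup)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return sorted(name for (name,) in rows if name.endswith(SUFFIX))


def count_rows(conn, targets, *, tolerate_missing):
    """Step 2: run one dynamic query per snapshotted target."""
    counts = {}
    for table in targets:
        try:
            (n,) = conn.execute(f'SELECT count(*) FROM "{table}"').fetchone()
        except sqlite3.OperationalError:
            # The table vanished after the snapshot was taken.
            if not tolerate_missing:
                raise  # old behaviour: one missing tenant aborts the scan
            continue  # new behaviour: skip this tenant and keep going
        counts[table[: -len(SUFFIX)]] = n
    return counts


def demo():
    conn = sqlite3.connect(":memory:")
    for tenant in ("t1", "t2", "t3"):
        conn.execute(f'CREATE TABLE "{tenant}{SUFFIX}" (id INTEGER PRIMARY KEY)')
        conn.execute(f'INSERT INTO "{tenant}{SUFFIX}" DEFAULT VALUES')
    targets = snapshot_targets(conn)
    # Simulate a tenant being dropped between the snapshot and the scan.
    conn.execute(f'DROP TABLE "t2{SUFFIX}"')
    try:
        count_rows(conn, targets, tolerate_missing=False)
        old_aborted = False
    except sqlite3.OperationalError:
        old_aborted = True
    new_counts = count_rows(conn, targets, tolerate_missing=True)
    conn.close()
    return old_aborted, new_counts
```

The design point is that the error handling wraps each per-tenant query rather than the whole loop, so a tenant that disappears mid-scan costs only its own result.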

Verification

vchord repro flips 6 failed → 7 passed; test_link_utils.py 40 passed.
Maintenance pair (test_maintenance_routines + test_maintenance_multitenant) green; new resilience regression test passes.
Migration: single head, test_migration_shape.py green, applies cleanly (e1f2a3b4c5d6 → c7e9f1a3b5d2).
ruff/ty clean (tests excluded from ruff lint per repo config).

Note: test-doc-examples (cli) failures seen on earlier runs are an unrelated Gemini real-LLM timeout / 400 INVALID_ARGUMENT flake, not touched here.

… stop cross-test contamination

The ANN tests in test_link_utils.py monkeypatch HINDSIGHT_API_VECTOR_EXTENSION (e.g. to "vchord"). That env var is read through the process-global config cache (get_config()), and monkeypatch reverts only the env var on teardown, not the cache. Once get_config() caches "vchord", it persists for the rest of the xdist worker. Every subsequent bank-creating test on that worker then builds per-bank vector indexes with `USING vchordrq` against the pgvector-only test DB and fails with:

asyncpg.exceptions.UndefinedObjectError: access method "vchordrq" does not exist

The failure cascades across dozens of unrelated tests in the test-api shard (test_list_documents, test_maintenance_routines, test_mental_models, test_observations, ...). Because the leak depends on which worker first populates the cache, the failure looked like a flaky, shard-specific infra problem.

Fix: add an autouse fixture to the class that clears the config cache before and after each test, so the cache is rebuilt from the current env per test and "vchord" can't leak out.

test_maintenance_multitenant provisions 100 tenant schemas by running CREATE SCHEMA + 5×CREATE TABLE per schema. Each statement autocommitted, so there was a window where a schema existed with only some of its tables. The global maintenance routines (public.schemas_with_expired_rows / banks_needing_consolidation) discover schemas by table presence and are exercised concurrently by test_maintenance_routines on another xdist worker against the shared test DB. They would query a not-yet-created table in a half-built schema and fail with:

asyncpg.exceptions.UndefinedTableError: relation "mt<hash>_NNN.memory_units" does not exist

Wrap the whole provisioning in a single transaction so the schemas become visible to other connections only once fully built.

…utines

public.banks_needing_consolidation() and public.schemas_with_expired_rows() snapshot the schemas owning a target table from pg_class, then run a dynamic query against each schema in turn. That is a TOCTOU race: a schema (or its tables) can be dropped between the snapshot and the per-schema query, for example when a tenant is deleted, when a tenant migration recreates tables, or (in the test suite) when the multi-tenant maintenance test creates and drops ~100 schemas concurrently with test_maintenance_routines on the shared DB. The query then aborts the whole routine with:

relation "<schema>.memory_units" does not exist
relation "<schema>.audit_log" does not exist

Forward migration c7e9f1a3b5d2 redefines both routines (CREATE OR REPLACE, public/base-run gated, PG-only) so each per-schema query runs in its own subtransaction that skips the schema on undefined_table / invalid_schema_name / undefined_column instead of failing the scan. Adds a deterministic regression test (schema with memory_units but no banks table) for the skip path.

…op chunks-mode leak

test_memory_defense._make_minimal_engine() builds a MemoryEngine inside a patch.dict that sets HINDSIGHT_API_LLM_PROVIDER=none. Constructing the engine calls get_config(), repopulating the process-global config cache from the patched env, and provider="none" forces retain_extraction_mode="chunks". When patch.dict restores the env, the cache still holds the "none"/chunks config. It then leaks to every later test on the same xdist worker: their retains run in chunks mode (raw text, NO entity extraction), so unrelated assertions fail. The most visible victims were the test_observations entity tests ("John/Alice/Nexora entity should exist"), which presented as a flaky, shard-specific failure (whichever entity test landed on the poisoned worker).

Drop the config cache after the patched env is restored so the next get_config() rebuilds from the real env.

Reproduced deterministically:

    pytest test_memory_defense.py::test_engine_memory_defense_shares_ext_ctx \
      test_observations.py::test_entity_extraction_on_retain
    # before: entity test FAILED (Insert unit_entities: 0 pairs)
    # after:  passed
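A compact way to see why `patch.dict` alone is not enough: it restores `os.environ` on exit but knows nothing about values already derived from it. The sketch below is illustrative only. `get_config()`, `clear_config_cache()` and `make_minimal_engine()` are hypothetical stand-ins, and the `"real-provider"` default and the `"entities"` mode name are placeholders invented for the example.

```python
import os
from unittest.mock import patch

PROVIDER_VAR = "HINDSIGHT_API_LLM_PROVIDER"  # env var named in the commit message
_config_cache = None


def get_config():
    """Hypothetical process-global cached config."""
    global _config_cache
    if _config_cache is None:
        # "real-provider" and "entities" are placeholders for this sketch.
        provider = os.environ.get(PROVIDER_VAR, "real-provider")
        mode = "chunks" if provider == "none" else "entities"
        _config_cache = {"provider": provider, "retain_extraction_mode": mode}
    return _config_cache


def clear_config_cache():
    """Hypothetical helper: drop the cached config."""
    global _config_cache
    _config_cache = None


def make_minimal_engine(*, drop_cache_after):
    """Build a stand-in 'engine' config under a patched environment."""
    with patch.dict(os.environ, {PROVIDER_VAR: "none"}):
        clear_config_cache()          # assumption: cache is rebuilt under the patch
        engine_config = get_config()  # caches provider="none" / chunks mode
    # patch.dict has restored os.environ here, but not the cache.
    if drop_cache_after:
        clear_config_cache()
    return engine_config
```

Calling `make_minimal_engine(drop_cache_after=False)` leaves later `get_config()` callers in chunks mode. With `drop_cache_after=True` the next call rebuilds from the restored environment.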

nicoloboschi added 3 commits June 19, 2026 12:41

nicoloboschi changed the title ~~fix(tests): reset config cache after vchord tests to stop test-api shard contamination~~ fix(tests): eliminate test-api shard cross-test contamination (vchord cache, tenant schemas, maintenance routine TOCTOU) Jun 19, 2026

nicoloboschi merged commit af42382 into main Jun 19, 2026
98 checks passed

nicoloboschi merged 4 commits into main from fix/test-api-vchordrq-shard-contamination
